Memory device search system and method

ABSTRACT

A search system and method is provided that may be implemented in a content addressable memory (CAM) using various different memory technologies including SRAMs, DRAMs or Embedded DRAMs. The search system increases the density and efficiency of the CAM by using a search tree to reduce the total number of entries that must be matched against the key.

BACKGROUND OF THE INVENTION

[0001] This invention relates generally to a system and method for performing rapid searches in a memory and in particular to a searching method and system for a content addressable memory device that permits rapid searches to be performed for data contained in the memory.

[0002] A content addressable memory (CAM) device is a memory storage device that accelerates any application that requires fast searches of data stored in the memory. For example, searching a database, a list, or for a particular pattern in database machines, image or voice recognition or computer and communication networks may be particularly well suited to using a CAM. A CAM operates by simultaneously comparing the desired information provided by the user against a list of pre-stored entries. The CAM gives an order of magnitude reduction in search time as compared to a typical random access memory (RAM).

[0003] A RAM is an integrated circuit that temporarily stores data. The data is stored in various different storage locations (addresses) and the user may specify a particular memory location to retrieve a particular piece of data. In other words, the user supplies the address and receives that data back from the RAM. In contrast, a traditional CAM stores data in random memory locations wherein each memory location has logic associated with each bit which permits comparison to a datum being searched for, commonly called a “KEY”. Each word of data also has a pair of status bits associated with it. The status bits keep track of whether the memory location has valid information or is empty and may be rewritten with new information.

[0004] Thus, the CAM stores a list of data in the memory locations. Once data is stored in the CAM, it is found by the user specifying a desired piece of data. The desired piece of data is fed into a compare register and is compared to each bit in each memory word location simultaneously. If there is a match with a memory location, the address of that memory location, commonly called the “ASSOCIATION”, is returned to the user. In other words, with a CAM, the user may supply a desired piece of data or pattern and the CAM may return an address or addresses if that pattern or piece of data was located in the CAM. Thus, the CAM may be used to rapidly compare the desired data to the list of data in the CAM since the comparisons are done in parallel. This feature makes CAMs particularly suited to performing different searching operations. A CAM may be generated from any number of different typical memory device technologies including dynamic RAMs (DRAMs), static RAMs (SRAMs) or embedded DRAMs.

[0005] A key problem with typical CAMs is that compare logic, which performs the comparison of the desired data to each memory location in the CAM, must be located at every memory cell location. This significantly increases the number of transistors that must be dedicated to the compare logic and correspondingly decreases the amount of storage in the CAM (since fewer transistors may be used for storage), assuming a fixed number of transistors on an integrated circuit. (The area ratio of traditional CAM to traditional SRAM may be calculated as at least 3×, due to the extra compare logic, and DRAM is approximately 7× to 10× denser than traditional SRAM. This leads to a 21× to 30× density advantage for DRAM compared to traditional CAM.) In addition, there is a large amount of power dissipation associated with every word having a dynamic match line that cycles during every compare operation. These problems severely limit the potential size of the CAM, both in terms of the silicon area and in terms of not being able to economically package the die due to the heat generated.

[0006] Thus, it is desirable to provide a novel search system and method for memory devices that overcomes the limitations and problems with typical CAM and it is to this end that the present invention is directed.

SUMMARY OF THE INVENTION

[0007] A new tree search architecture in accordance with the invention is provided that is suitable for accelerating associative searches for data stored in any memory. In a preferred embodiment, the search architecture in accordance with the invention may be implemented as a new Content Addressable Memory (CAM) in accordance with the invention. The CAM in accordance with the invention may be produced using typical commodity dynamic random access memory (DRAM) technology process or using a static random access memory (SRAM) technology process for smaller, faster memory devices. In alternative embodiments of the device in accordance with the invention, a modified DRAM technology process with improved transistors in the branching logic for speed (typically known as Embedded DRAM) may be used. Thus, the invention may be implemented using various different memory technologies including DRAM, SRAM or Embedded DRAM.

[0008] The search system and method in accordance with the invention permits a very large memory (suitably arranged as described below) to be addressed as a random access memory (RAM) and then data stored in the device may be searched using a content addressable memory (CAM) technique. This arrangement in accordance with the invention will permit at least a twenty times (20×) density (size) increase compared to other typical CAM memory organizations. The size/density increase in accordance with the invention greatly increases the utility of the memory device in accordance with the invention for a broad class of applications ranging from pattern recognition, data sorting and look-up to Internet traffic routing. When the memory device in accordance with the invention is used with suitable software, this architecture will greatly speed up Internet search engines and database servers.

[0009] The combination of a novel search method and commodity RAM in accordance with the invention constitutes a new approach that permits the CAM to achieve a lower commodity cost similar to standard DRAM organizations by eliminating match logic completely in the memory cell. Thus, standard, typical well known RAM processing technology may be used for producing these memory devices in accordance with the invention. In accordance with another aspect of the invention, portions of the RAM arrays may be configured as RAM only so that the density available as RAM when using the device in accordance with the invention is doubled compared to its use as a CAM, which makes the device used as a RAM more flexible.

[0010] In more detail, the search system and method in accordance with the invention may add additional pointers to a B⁺-tree search algorithm/method so that the tree structure looks like a conventional CAM, but may be accessed by typical RAM addressing. When the method in accordance with the invention is implemented in an efficient hardware solution in a preferred embodiment, a commodity priced, DRAM-density CAM is produced. In more detail, the CAM in accordance with the invention may include a controller/comparator and two RAM memory blocks. The controller may organize the two RAM memory blocks and access them accordingly to achieve the desired CAM operation. The functions in accordance with the invention as described below may be implemented on a single silicon die or as several silicon dies in a multi-chip package.

[0011] Thus, in accordance with the invention, a memory device is provided, comprising a main data memory for storing a plurality of entries in the memory device and an address map and overflow data memory for storing an address map of the entries in the main data memory wherein the address map comprises an intended address location (IAL) and an actual physical location (APL) wherein the IAL indicates the external memory address of each entry and the APL indicates the actual memory location of each entry within the memory device. The memory device further comprises a controller for controlling the operation of the main data memory and the address map and overflow data memory using the IAL and APL in order to operate the memory as one or more of a CAM and a RAM and a comparator that compares each bit of an incoming piece of data with each bit of each entry in the memory device. The controller of the memory device further comprises a search tree logic unit that sorts through the entries in the memory device to reduce the number of bit-by-bit comparisons performed by the comparator.

[0012] In accordance with another aspect of the invention, a memory device is provided wherein the memory device comprises a main data memory for storing a plurality of entries in the memory device and an address map and overflow data memory for storing an address map of the entries in the main data memory wherein the address map comprises an intended address location (IAL) and an actual physical location (APL) wherein the IAL indicates the external memory address of each entry and the APL indicates the actual memory location of each entry within the memory device. The memory device further comprises a controller for controlling the operation of the main data memory and the address map and overflow data memory using the IAL and APL in order to store and retrieve data from the memory and a comparator that compares each bit of an incoming piece of data with each bit of each entry in the memory device. The memory device further comprises a search tree logic unit that sorts through the entries in the memory device to reduce the number of bit-by-bit comparisons performed by the comparator.

[0013] In accordance with another aspect of the invention, a memory device is provided wherein the memory device comprises a main data memory for storing a plurality of entries in the memory device and an address map and overflow data memory for storing an address map of the entries in the main data memory wherein the address map comprises an intended address location (IAL) and an actual physical location (APL) wherein the IAL indicates the external memory address of each entry and the APL indicates the actual memory location of each entry within the memory device. The memory device further comprises a controller for controlling the operation of the main data memory and the address map and overflow data memory using the IAL and APL in order to store and retrieve data from the memory, the controller further comprising an organizer that organizes the memory into a plurality of bins wherein each bin comprises a plurality of sub-bins and each sub-bin comprises a plurality of entries in the memory device wherein each bin and sub-bin has a least value and a most value associated with it that indicate a minimum value and a maximum value contained in the bin or sub-bin. The controller further comprises a search tree logic unit that compares an incoming piece of data to the plurality of bins based on the least and most values to identify a bin in which the incoming piece of data is located and that compares the incoming piece of data to the sub-bins within the identified bin to determine the sub-bin that contains an entry matching the incoming piece of data.

BRIEF DESCRIPTION OF THE DRAWINGS

[0014] FIG. 1 is a block diagram illustrating a content addressable memory (CAM) in accordance with the invention;

[0015] FIG. 2 is a diagram illustrating more details of the search architecture in accordance with the invention of the CAM in accordance with the invention;

[0016] FIG. 3 illustrates the silicon area advantage of the very wide tree lookup in accordance with the invention;

[0017] FIG. 4 shows a block diagram of a DIMM memory architecture in accordance with the invention;

[0018] FIG. 5 is a more detailed illustration of the use of the APL when the memory device in accordance with the invention is used in the RAM mode;

[0019] FIG. 6 is a more detailed illustration of the use of the IAL when the memory device in accordance with the invention is used in the CAM mode;

[0020] FIG. 7 is a diagram illustrating more details of the search architecture in accordance with the invention of the CAM in accordance with the invention;

[0021] FIG. 8 is an alternate view of the TREE branching data structure in accordance with the invention;

[0022] FIG. 9 is a diagram illustrating more details of the compare and control logic at each bin of the device in accordance with the invention;

[0023] FIG. 10 is a diagram illustrating the flexible multiplexing that allows use of the memory device in accordance with the invention as a BINARY CAM or a TERNARY (MASKED) CAM; and

[0024] FIG. 11 is a diagram illustrating the insertion and search method in accordance with the invention.

DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT

[0025] The invention is particularly applicable to a 64 Mbit content addressable memory (CAM) device that uses 128 Mbits of DRAM and it is in this context that the invention will be described. It will be appreciated, however, that the system and method in accordance with the invention has greater utility, such as to other types of memory devices that may use other types of typical memory blocks such as SRAM or Embedded DRAM. In addition, the invention may be implemented on a single silicon die or as several dies in a multi-chip package. It will be appreciated that this architecture achieves the same advantages over traditional CAM in subsequent memory density generations. Now, the preferred embodiment of the invention implemented as a CAM will be described.

[0026] FIG. 1 is a block diagram illustrating a preferred embodiment of a single die content addressable memory (CAM) 20 in accordance with the invention that implements the search architecture in accordance with the invention. The CAM 20 may include a semiconductor die 22 that interfaces to other external integrated circuits (ICs). The external ICs may, for example, supply an external address and control signals and other external data to the die 22 and may receive data from the die 22 that may include optional match port data indicating a match has occurred between a location in the CAM and the data in the compare register.

[0027] The semiconductor die 22 may include a control/compare block 24, a main data RAM (MDR) 26 and an address map and overflow data RAM (AMR) 28. The MDR and AMR are each separate typical RAM memory devices in this embodiment. The control/compare block 24, which is described below in more detail with reference to FIGS. 2-11, may control the operation of the CAM including storing data and performing the data comparison as described below. The control/compare block 24 may also include tree traversal logic in accordance with the invention that implements the searching method and system in accordance with the invention. The MDR 26 may contain the main memory store for the CAM, may be controlled by the control/compare block using an address/control bus 30, and may exchange data with the control/compare block over a data bus 32. Similarly, the AMR 28 may contain an address map of the contents of the CAM and overflow data RAM, may be controlled by the control/compare block using an address/control bus 34, and may exchange data with the control/compare block over a data bus 36.

[0028] In operation, the control/compare block 24 may organize the 2 RAM memories (MDR and AMR) and access them appropriately to achieve the desired CAM operation. As described above, these functions can be contained on a single silicon die or on several dies in a multi-chip package. In the preferred embodiment shown, the MDR 26 may hold 8 Mbytes of stored RAM/CAM data. The AMR 28 may contain both the intended address location (IAL) of the data stored at a corresponding physical location in the MDR and the actual physical location (APL) of the stored data for RAM-style read queries.

[0029] In the preferred embodiment, the link structures for the data records of the AMR may look like: AMR_Data[63...40, 39...20, 19...0]

[0030] wherein bits 40-63 contain various flags and short links, APL data is stored in locations 20-39 and IAL data is stored in locations 0-19 as described in more detail in Table 1. The structure shown above is for a particular preferred embodiment with a particular implementation and the structure may be changed for different implementations without departing from the scope of the invention.

TABLE 1. Bit field meaning for AMR data for 1M*64 CAM

Field Name: IAL (“Intended Address Location”)
Bit position: [19:0]
Brief Description: This is the destination address indicated by the external address during a RAM write command to the CAM area. This is returned as part (or all) of the association mapping during the CAM operation, once a data pattern match is completed. This field is stored in the AMR at the “same” (or simply mapped) address as the Data in the MDR.

Field Name: APL (“Actual Physical Location”)
Bit position: [39:20]
Brief Description: During a RAM read to the CAM area, this is fetched first and used as the address for the MDR to fetch data. This implies that RAM reads are generally Random Accesses to MDR. This is generally true for database management tasks, until an actual table is being fetched. This field is stored at the address pointed to by the IAL, that is, the location where the data would have been stored in a regular RAM.

Field Name: LINKS/flags
Bit position: [63:40]
Brief Description: This is dependent on implementation details.
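As an illustration only, the bit layout of Table 1 can be sketched in software as a pair of pack/unpack helpers. The function names pack_amr and unpack_amr are hypothetical and do not appear in the disclosure; only the field positions (IAL in bits 19:0, APL in bits 39:20, links/flags in bits 63:40) are taken from the text above.

```python
# Minimal sketch of the 64-bit AMR record layout from Table 1.
# pack_amr / unpack_amr are hypothetical helper names, not part of the disclosure.

IAL_BITS = 20    # bits [19:0]
APL_BITS = 20    # bits [39:20]
FLAG_BITS = 24   # bits [63:40], implementation dependent

IAL_MASK = (1 << IAL_BITS) - 1
APL_MASK = (1 << APL_BITS) - 1
FLAG_MASK = (1 << FLAG_BITS) - 1


def pack_amr(flags, apl, ial):
    """Combine the three fields into one 64-bit AMR word."""
    for name, value, mask in (("flags", flags, FLAG_MASK),
                              ("apl", apl, APL_MASK),
                              ("ial", ial, IAL_MASK)):
        if not 0 <= value <= mask:
            raise ValueError(f"{name} does not fit in its field")
    return (flags << 40) | (apl << 20) | ial


def unpack_amr(word):
    """Split a 64-bit AMR word into (flags, apl, ial)."""
    return ((word >> 40) & FLAG_MASK,
            (word >> 20) & APL_MASK,
            word & IAL_MASK)
```

With 20-bit IAL and APL fields, each can address 2^20 (about one million) locations, which matches the 1M*64 organization named in the table title.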

[0031] In accordance with the invention, the 2 DRAM blocks (MDR and AMR) may also be available as very fast ordinary RAM, in which case the Controller/Comparer 24 may configure the CAM to allocate anywhere from 0-100% of the DRAM memory locations to the CAM and the remainder to the RAM operation. Even with the allocation of memory locations, the system still permits RAM-style accesses to the part being used (mapped) to the CAM operation. For the memory locations being used strictly for RAM operations, typical full speed burst operations may be available. This allows the CAM to be used in DIMM sockets in servers, which permits an easy upgrade path for use in list processing and data manipulation tasks in servers. Now, details of the search architecture and method in accordance with the invention will be described.

[0032] FIG. 2 is a diagram illustrating the searching architecture 40 in accordance with the invention that permits a more rapid searching of the contents of the CAM in accordance with the invention. In accordance with the invention, a very wide search tree as described below may be used in order to converge on a data match in a tree structure rapidly. A very wide search tree is also more economical with branching between 64 and 1024 ways at each level, depending on the size of the ultimate DRAM that contains the leaves. In this preferred embodiment of a 1M*64 CAM architecture, there is a 2 level B-tree structure that finds an index into a final “bin” or “leaf” which contains 64 entries in a DRAM. The 64 entries may then be fetched by address (i.e., the index is retrieved from the B-tree structure) and compared against the key so that the comparison occurs with only the 64 entries instead of all of the entries, which significantly reduces the comparison time of the CAM in accordance with the invention. In the architecture, note that there is no “CAM-cell” memory structure in the large memory blocks, only SRAM and DRAM memory cells.

[0033] Returning to FIG. 2, the architecture 40 may receive input data (a “key”) that may be 64 bits in the example of the preferred embodiment. In accordance with the invention, the key may be fed into a 256 way compare and branch logic 42 that compares the key to each of 256 groups of the memory to generate a single pointer to the next branch level. The pointer generated by this logic 42 may be fed into a 64 way compare and branch logic 44 which also is fed the key. This logic 44 may again compare the key to each of 64 groups within the selected group from the original 256 to generate a single selected memory pointer to a block of memory. In this manner, the number of full memory locations that are compared to the entire key is rapidly reduced so that the final comparison of the full key to memory locations may be completed rapidly. The structure of the compare and branch logic 42, 44 is further illustrated in FIG. 7.
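The two-level narrowing described above can be modeled in software as follows. This is a minimal sketch under stated assumptions, not the hardware of FIG. 2: the names build_tree, select_branch and search are hypothetical, the keys are assumed to be unique integers supplied in sorted order, each branch is represented only by the least value beneath it, and the parallel hardware comparison is replaced by a sequential binary search.

```python
from bisect import bisect_right


def select_branch(least_values, key):
    """Index of the last branch whose least bound is <= key, or None."""
    i = bisect_right(least_values, key) - 1
    return i if i >= 0 else None


def build_tree(sorted_keys, root_ways=256, node_ways=64, bin_size=64):
    """Group sorted unique keys into bins, bins into nodes, nodes under a root."""
    bins = [sorted_keys[i:i + bin_size]
            for i in range(0, len(sorted_keys), bin_size)]
    nodes = [list(range(i, min(i + node_ways, len(bins))))
             for i in range(0, len(bins), node_ways)]
    if len(nodes) > root_ways:
        raise ValueError("too many keys for this tree shape")
    # Each branch stores the least value found beneath it.
    node_least = [[bins[b][0] for b in node] for node in nodes]
    root_least = [least[0] for least in node_least]
    return root_least, node_least, nodes, bins


def search(tree, key):
    """Return (bin_index, slot) of an exact match, or None."""
    root_least, node_least, nodes, bins = tree
    r = select_branch(root_least, key)          # first-level (e.g. 256-way) branch
    if r is None:
        return None
    s = select_branch(node_least[r], key)       # second-level (e.g. 64-way) branch
    bin_index = nodes[r][s]
    for slot, entry in enumerate(bins[bin_index]):  # only one bin is examined
        if entry == key:
            return bin_index, slot
    return None
```

For example, with 200 keys and a deliberately small tree shape (root_ways=8, node_ways=4, bin_size=8), search touches at most 8 stored entries instead of all 200; the default arguments correspond to the 256-way and 64-way branching and 64-entry bins of the preferred embodiment.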

[0034] The output of the second compare and branch logic 44 (the address of a small group of memory in the CAM) is fed into multiplexer 46. The APL signal from the AMR (during random access reads to the CAM) and a read/write address (the memory address for non-CAM random access reads or writes) may also be input into the multiplexer so that the output of the multiplexer is the address of a bin, so that the MDR may function like a CAM and return an address of a matching memory location or may function like a RAM. During CAM operation, the multiplexer may output the DRAM address for a matching entry (memory location) in the CAM from the tree. In more detail, the DRAM address may be 1 of over 1 million entries (256×64×64 in this example) wherein the entry is located in one of 16,384 different memory bins as selected by the two compare and branch logic circuits 42, 44 as shown in FIG. 2. The actual number of bins and entries varies with different embodiments and depends on the actual branches performed by each circuit 42, 44. In this example, each bin (selected by the two logic circuits 42, 44) may contain up to 64 64-bit entries that may be examined for a match. Thus, in this preferred embodiment, instead of matching the key against over a million entries, the key may be matched against 64 entries, which significantly reduces the time required to perform the comparison compared to the time required for a sequential search of the DRAM and significantly reduces the circuitry required to perform the match compared to the circuitry required in a traditional CAM (by a factor of a constant multiple of 16,384 in this instance or, in general, by a factor which is a constant multiple of the total memory/branch bin size).
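The counts quoted above can be checked with a few lines of arithmetic. The flat addressing scheme in entry_address is an assumption made for illustration (the disclosure does not specify how the branch selections are combined into a DRAM address); only the fan-out values 256, 64 and 64 come from the text.

```python
# Worked arithmetic for the example tree shape (256-way, 64-way, 64-entry bins).
ROOT_WAYS = 256     # first compare-and-branch logic 42
NODE_WAYS = 64      # second compare-and-branch logic 44
BIN_ENTRIES = 64    # entries per bin

total_bins = ROOT_WAYS * NODE_WAYS            # 16,384 bins
total_entries = total_bins * BIN_ENTRIES      # 1,048,576 entries


def entry_address(root_branch, node_branch, slot):
    """Hypothetical flat address: 8 + 6 + 6 = 20 address bits in this example."""
    if not (0 <= root_branch < ROOT_WAYS
            and 0 <= node_branch < NODE_WAYS
            and 0 <= slot < BIN_ENTRIES):
        raise ValueError("index out of range")
    return (root_branch * NODE_WAYS + node_branch) * BIN_ENTRIES + slot
```

Under this assumption, a full-key comparison is needed for only 64 of the 1,048,576 entries, a reduction by a factor of 16,384.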

[0035] The advantages of the wide tree structure are threefold. First, the ratio of storage in the tree is very low (in terms of number of bits) in relationship to the final data storage, since the comparisons at each level can be performed in parallel across 4-64K bits of comparator logic for speed. This is illustrated by FIG. 3, which shows the leverage obtained by viewing the relative area of the tree silicon resources. In particular, FIG. 3 illustrates that the memory device in accordance with the invention may be used as 128 Mbits of DRAM, 64 Mbits of binary CAM or 32 Mbits of ternary CAM with 32 Mbits of associated data. In addition, the generated heat per bit examined is reduced as compared to the toggling of the match line for every word in a comparable size traditional CAM. This makes it possible to achieve a CAM with state of the art DRAM density at a small multiple of the cost of commodity DRAMs.

[0036] An example of a 64 Mbit binary/ternary CAM is shown. The CAM may include various elements that will be briefly described. The sizes of the boxes shown in FIG. 3 represent the actual size of the respective elements in silicon area when incorporated into a memory device. The CAM may include a root element 200 that may be 48 Kbits of SRAM with compare logic per word. The CAM may also include an SRAM array 202 with 3 Mbits of SRAM (it could also be DRAM and have a smaller size) that contains the 2nd level memory bins in accordance with the invention. The CAM may also include compare logic 204 with 4 Kbits of compare, mask and routing logic. The CAM may also include leaf memory 206 that may be a DRAM array with 128 Mbits of DRAM. Finally, the CAM may include a second logic layer 208 that may include 4 Kbits of compare, mask and routing logic.

[0037] This may result in a specialty DIMM module that may be configured as SD-100 DRAM or, alternatively, a portion or all of the memory could be configured as CAM, which would have about ½ the density as when configured as a DRAM only. These DIMMs could be used by a CPU to speed up associative searches and sorting, but still be useful when configured as ordinary DRAM. In addition, the portion configured as CAM memory could still be conveniently accessed as RAM. A vast array of database tasks could be sped up, effectively multiplying the value of a server containing the special DIMMs.

[0038] A physical block diagram of the arrangement of the DIMM 218 in accordance with the invention is shown in FIG. 4. The diagram shows a separate ASIC 220 that performs the interface to the memory bus, and also contains the tree logic from FIGS. 2 and 7. MDR DRAMs 222 and AMR DRAMs 224 are shown as separate DRAMs, which may be either standard DRAMs, in a slower version, or specially designed DRAMs that optimize the tree search. Preferably, the MDR and AMR may be low latency, wide interface specialty DRAMs in the multi-die module and may each contain the leaf nodes of the search tree. The DIMM may power up looking like a typical JEDEC DRAM and then be configured as a CAM through software. Now, the operation of the memory device as RAM using the APL address and as CAM using the IAL address will be described in more detail.

[0039] Referring to FIG. 5, during operation as a RAM, each leaf entry 240 may also contain a pointer 242 to where the data that would have been stored in an ordinary RAM was actually stored during the sorting process (the APL as described above). Similarly, referring to FIG. 6, during operation as a CAM, each entry in the leaves of the tree 240 may have associated with it an address 246 where the program “thought” it was storing the data (the “association address” or “mapping address”) when it wrote the data into the CAM area (the IAL described above performs this function). When reading the CAM area as a RAM, the input address will fetch this APL pointer to find the “RAM” data. That is, as data is written into the CAM area, it is sorted into bins with data which is “close” in magnitude as described below, and a pointer to the actual physical location of the key data is stored at the physical address that will be pointed to when attempting to retrieve data as a RAM. In many versions of the invention, the APL portion of the AMR will be able to be accessed separately in time (i.e., in a pipeline) from the access of the data portion (in the MDR) and the IAL portion of the AMR. This will prevent accesses to the APL from blocking accesses to the MDR. FIGS. 5 and 6 therefore show the logical grouping of the entries as a memory entry, but physically they are likely to be separate physical blocks.
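The complementary roles of the APL and IAL can be illustrated with a toy software model. Everything named here (TinyCamRam, ram_write, ram_read, cam_search, _place) is hypothetical. The placement policy is a simple stand-in for the magnitude-based sorting into bins, and the CAM lookup is a linear scan standing in for the tree search; integer data is assumed.

```python
class TinyCamRam:
    """Toy model: data lives at a physical slot chosen by the device,
    while the AMR remembers both directions of the address mapping."""

    def __init__(self, size):
        self.mdr = [None] * size   # main data, indexed by physical location
        self.ial = [None] * size   # physical location -> external address
        self.apl = [None] * size   # external address -> physical location

    def _place(self, data):
        # Stand-in placement policy (an assumption): start at data % size
        # and probe linearly for a free slot.
        size = len(self.mdr)
        for step in range(size):
            phys = (data + step) % size
            if self.mdr[phys] is None:
                return phys
        raise MemoryError("no free slot")

    def ram_write(self, ext_addr, data):
        old = self.apl[ext_addr]
        if old is not None:        # overwriting: release the old slot first
            self.mdr[old] = None
            self.ial[old] = None
        phys = self._place(data)
        self.mdr[phys] = data
        self.ial[phys] = ext_addr  # remembered for CAM-style results
        self.apl[ext_addr] = phys  # remembered for RAM-style reads

    def ram_read(self, ext_addr):
        phys = self.apl[ext_addr]  # APL is fetched first...
        return None if phys is None else self.mdr[phys]  # ...then the MDR

    def cam_search(self, key):
        for phys, data in enumerate(self.mdr):
            if data is not None and data == key:
                return self.ial[phys]  # report the address the writer used
        return None
```

A program that writes 42 to external address 3 reads 42 back from address 3 and receives 3 as the association when searching for 42, even though the value is physically held elsewhere.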

[0040] Returning to FIG. 2, each branch in the tree has an associated Key value that defines the least bounding value for the subsequent branches or leaves underneath that branch and the address pointer to the next node in the tree, or the final leaf or “bin” of data. The method for inserting the entries into the tree may attempt to keep the number of branches at each level to less than ½ the maximum until all subsequent levels in the tree are similarly filled to at least ½ capacity. This insertion method should leave plenty of room to insert data into each bin without excessive collisions until the memory is more than 63/64ths full (i.e., 64 = the number of elements in a bin). A description of the corner case where the memory is “almost full” is provided below in connection with an insertion and smoothing method in accordance with the invention.

[0041] In operation, since SRAM access speeds of much less than 10 ns are now possible, each branch in the 2 level tree shown in FIG. 2 may be traversed in on the order of 10-15 ns. With state-of-the-art DRAM storage and a single die implementation, the 64 entries per bin (in the embodiment shown in FIG. 2) should be accessible in 20 ns as a 64*64 or 4 Kbit entity. This speed implies that pipelined operation of the branches and lookup for DRAM versions should run at 50 MHz (faster if different banks are accessed), or over 20 MHz in a non-pipelined mode. Now, the hardware that may be used to implement the search tree architecture shown in FIG. 2 will be described in more detail.
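The quoted rates follow from the stage times if one assumes that a pipelined search is limited by its slowest stage and a non-pipelined search by the sum of all stages. That model, and the helper name rate_mhz, are assumptions for illustration; the stage times are the figures given above.

```python
def rate_mhz(period_ns):
    """Operations per second, in MHz, for one operation every period_ns."""
    return 1000.0 / period_ns


BRANCH_NS_FAST, BRANCH_NS_SLOW = 10, 15   # per-branch traversal, 2 branch levels
BIN_FETCH_NS = 20                         # fetch of one 4 Kbit bin from DRAM

# Pipelined: throughput is set by the slowest stage (the DRAM bin fetch).
pipelined = rate_mhz(max(BRANCH_NS_SLOW, BIN_FETCH_NS))          # 50 MHz

# Non-pipelined: each search pays for both branch levels plus the fetch.
non_pipelined_slow = rate_mhz(2 * BRANCH_NS_SLOW + BIN_FETCH_NS)  # 20 MHz
non_pipelined_fast = rate_mhz(2 * BRANCH_NS_FAST + BIN_FETCH_NS)  # 25 MHz
```

Under these assumptions the non-pipelined rate falls between 20 MHz and 25 MHz, consistent with the "over 20 MHz" figure when the branches run faster than 15 ns.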

[0042] FIG. 7 is a diagram illustrating a preferred hardware implementation of the search architecture 40 that includes the first branch logic 42, the second branch logic 44 and the comparator/DRAM address register 46. In more detail, the search architecture may include a set of registers 50 that store the AMR data and thus include two extra bits in the links that are the status bits indicating an active branch or not. This register memory, combined with ALU 52, may be organized as a small special CAM, with SRAM cells for memory instead of registers.

[0043] A comparison of the 64-bit key and the branch data from the registers 50 is performed in the ALU 52. Each branch value from the registers 50 is compared with the key for a greater-than-or-equal condition. The results of the comparison are priority encoded based on the possible 256 branches at this level of the tree (with a larger branch number having higher priority). The status bits suppress inactive branches from participating in the comparison. The output of the ALU may be fed into a multiplexer 54 that selects the 8-bit pointer corresponding to the highest branch that compared greater than or equal. The output of the multiplexer is a selection of one of the 256 bins at this level and its associated address. The output of the multiplexer may be stored in an SRAM address register 56 that may be 8 bits in size in this embodiment. The address stored in the register may be used to retrieve data from an SRAM 58.
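The compare, suppress and priority-encode sequence can be sketched as below. The function name priority_select is hypothetical, and the comparison direction is an interpretation: since each branch value is described elsewhere as the least bounding value beneath that branch, the sketch treats a branch as a candidate when the key is greater than or equal to the branch value. The loop stands in for comparisons that the hardware performs in parallel.

```python
def priority_select(branch_values, active, pointers, key):
    """Return the pointer of the highest-numbered active branch whose
    least bounding value is <= key, or None if no branch qualifies."""
    selected = None
    for i, (value, is_active) in enumerate(zip(branch_values, active)):
        # Status bits suppress inactive branches from the comparison.
        if is_active and key >= value:
            selected = i           # a larger branch number has higher priority
    return None if selected is None else pointers[selected]
```

For example, with branch values [0, 100, 200, 300], the third branch marked inactive, and pointers [10, 11, 12, 13], a key of 250 selects pointer 11, because branch 2 is suppressed and branch 3 starts above the key.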

[0044] The output from the SRAM may then be fed into the second branch logic 44 along with the key. The branch logic 44 may further include an ALU 60 that performs priority encoding based on the 64 branches at this level and outputs the resulting data. The resulting priority encoded data and the data from the SRAM may then be fed into a multiplexer 62. The output of the multiplexer 62 is the address of the least entry of a 64 entry bin and the address may be stored in the DRAM address register 46 so that the DRAM address may be output.

[0045] The above embodiment is merely an example of a device that may be implemented in accordance with the invention. For example, the “N” in each N-way branching logic is clearly flexible and can be tailored to fit the needs of the target DRAM memory and the ultimate size of the DRAM array. Thus, some implementations might make the branching number lower or higher than indicated here.

[0046] In some embodiments, the multiplexers and associated SRAM bits (8 and 20 respectively) will be replaced with simpler and smaller logic that simply encodes the output of the priority encoder into an 8 or 20 bit (16 bits plus 4 trailing 0 bits to define a bin) value, eliminating a level of indirection. This may be acceptable in many cases, and will have superior area efficiency.

[0047] In the embodiment shown above, a “Nearest search” closeness based on two's complement size is clearly very robust in this scheme. Once a key has found the best candidate bin, if an exact match was not present, the entries in that bin could be examined to find which was closest. This could be accomplished either by examining all entries in parallel or, in the case where the entries in a bin have links (6 bits in this case of a 64 entry bin) which indicate the ordering of the entries, by performing a binary partition search to find between which 2 entries the key falls.
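A binary partition search for the nearest entry within one bin might look like the following sketch. The name nearest_in_bin is hypothetical; the entries are assumed to be signed integers already arranged in the order given by the links, and ties are resolved toward the smaller entry, which is a choice made here rather than something the text specifies.

```python
from bisect import bisect_left


def nearest_in_bin(ordered_entries, key):
    """Return the entry closest in value to key, or None for an empty bin."""
    if not ordered_entries:
        return None
    pos = bisect_left(ordered_entries, key)   # binary partition of the bin
    if pos == 0:
        return ordered_entries[0]             # key is below every entry
    if pos == len(ordered_entries):
        return ordered_entries[-1]            # key is above every entry
    below, above = ordered_entries[pos - 1], ordered_entries[pos]
    # The key falls between these two entries; pick the closer one.
    return below if key - below <= above - key else above
```

For a 64-entry bin, the partition step needs at most about six comparisons, which matches the 6-bit ordering links mentioned above.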

[0048] In accordance with the invention, it is possible to arrange the CAM circuitry to perform 128 bit CAM operations, or any other desired size, by additional pipeline stages in the ALU operation or by running the branch stages at a slower rate if that is required. This may also be configurable based on a status bit. In accordance with the invention, the efficiency of this search architecture improves as the data match (key) gets bigger since the overhead of the AMR becomes a smaller percentage of the total memory space. In addition, by using the association address (the address where data is stored, i.e., the IAL) as a further pointer to data stored in the portion configured as conventional DRAM, the efficiency of the architecture is improved even further.

[0049] The memory in the branches will be DRAM in many embodiments or the final “look up (leaves) bins” could conceivably also be SRAM. This disclosure is anticipated to be the preferred way. Also, the detailed memory architecture below is not required for the basic algorithm to work, albeit with less speed or energy efficiency.

[0050] The invention may be used for a variety of applications where the speed increase due to the search system and method is particularly beneficial. For example, the invention may be used for image processing, pattern recognition, database mining applications, artificial learning, image recognition (satellites, etc.), IP address routing and lookup, routing statistics for networking applications, and voice recognition in both mobile and desktop computers. In addition, DIMMs in accordance with the invention as described above may be used in server farms and central offices for international language translation uses and URL matching. Further, the invention may be used for disk/database caching, multi-media applications (e.g., compression algorithms) and scientific simulations.

[0051] FIG. 8 is a diagram illustrating an example of the basic data structures in the branches and bins of the memory device in accordance with the invention. The diagram illustrates a first level of bins 250, a second level of bins 260 and a third level of bins 270. As described above, the first level of bins defines 256 super bins which each contain 64 bins themselves of 64 entries each. The second level of bins 260 may be selected by a first level of bins and each second level bin may contain 16K bins of 64 entries each. The second level of bins 260 each point to a set of 64 entries that may then be compared to the key as described above. Thus, using the search tree in accordance with the invention, the memory device rapidly searches through 1 million 64 bit entries. Now, the control for each bin in the CAM in accordance with the invention will be described in more detail.
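
The capacity figures in this example can be checked with a few lines of arithmetic. The sketch below uses only the numbers given above; the constant names are illustrative, and the final comparison count assumes (this is not stated in the text) that each level compares the key against every value at that level exactly once.

```python
SUPER_BINS = 256          # first-level branches
BINS_PER_SUPER_BIN = 64   # second-level branches under each super bin
ENTRIES_PER_BIN = 64      # entries in each leaf bin
ENTRY_BITS = 64           # width of one entry

total_bins = SUPER_BINS * BINS_PER_SUPER_BIN      # the "16K bins"
total_entries = total_bins * ENTRIES_PER_BIN      # the "1 million" entries
total_bits = total_entries * ENTRY_BITS           # main data storage in bits

# Assumption: one comparison per value at each of the three levels.
compares_with_tree = SUPER_BINS + BINS_PER_SUPER_BIN + ENTRIES_PER_BIN

print(total_bins, total_entries, total_bits // 2**20, compares_with_tree)
```

Under that assumption a search touches a few hundred values rather than all 1,048,576 entries, which is the reduction the search tree is intended to provide.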

[0052] FIG. 9 is a diagram illustrating the control and compare logic 100 for the memory banks in accordance with the invention. The control and compare logic 100 may include a global address and data bus 102 that provides communication between the search architecture and the local control and compare logic, a local compare and match logic 104 for each bank of memory, and one or more banks of memory 106 that each contain one or more bins of memory 108. As shown, the multiple local match and compare logic units may be interconnected by a bus 105. A single match and compare logic may be shared by one or more banks of memory for better layout efficiency on the semiconductor die, or each bank may have its own local match and compare logic, which may have an MDR signal and an AMR signal that are connected to the memory bank controlled by that local match and compare logic. In the embodiment shown in FIG. 9, the MDR and AMR signals may be 4096 bits wide, each bank may be 16 Mbits with seven banks and the total number of 64 entry bins may be 16383.

[0053] In operation, the final bins of data in the DRAM may be stored in order of receipt, i.e., unsorted for size. Then, the final comparison/prioritizing operation may find the closest (or exact) match by comparing all 64 entries in a bin in parallel as shown in FIG. 9. This conserves most of the power and time that would otherwise be needed to move data items into a sorted order in the DRAM. When the bins get full and the most or least entry needs to be moved to the next bin over, extra sorting processing and energy must be used. It is important to note that FIG. 9 is not a semiconductor floor plan, but a logic organization diagram for purposes of illustration, and there may be many floor plans or organizations that are encompassed by the invention. Further, for a particular implementation, the actual bitline and wordline lengths will be dictated by feasible multiplexing and sensing ratios in a particular DRAM technology generation.

[0054] Returning to the example shown in FIG. 9, the memory may be organized into two 64 Mbit blocks of eight 8 Mbit banks each. Each bank may include a very wide organization of (512 deep) bitlines, 16k wide wordlines (rows) multiplexed to produce 4k data out, giving a “column” address of 2 bits to differentiate the bin within a row of 4 bins. One block may be for the MDR and one for the AMR. In the example shown, these blocks are physically organized such that bank* of MDR is next to bank* of the AMR. The mapping of the 4 bin rows into the banks is flexible because the pointer from the level 2 branching can reach any 64 bit memory location to define the bin in question. Now, several different bin mapping techniques will be discussed.

[0055] One technique to map the memory bins is that the next row in a bank contains bins 4 away from the current ones. Thus, the same row in separate banks represents bins 4*bitline depth or 1K away. This lets the “bins” on “either side” of the bin addressed from the B-tree be accessed for quick splitting of leaves in the tree. This mapping minimizes the energy used to “smooth” bin utilization out so that bins always have storage available for quick insertion and also maximizes the chance that smoothing can run as a background task. (In other words, a bank that is not being accessed can easily move data to neighboring bins as a background task to equalize the entries in each bin, without requiring use of a global bus to move data around except for connecting the ends of banks to each other.)
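
One way to picture this row-major mapping is as a decomposition of a bin index into bank, row and 2 bit column. The sketch below is an illustration under assumptions: the function name locate_bin is hypothetical, and the constants are taken from the FIG. 9 example above (eight banks per block, 512 rows per bank, 4 bins per row). With those constants the stride between the same row of adjacent banks works out to 4 × 512 = 2048 bins; a different bitline depth would give a different stride.

```python
BANKS = 8            # banks per block, from the FIG. 9 example
ROWS_PER_BANK = 512  # bitline depth, from the FIG. 9 example
BINS_PER_ROW = 4     # selected by the 2 bit column address

BINS_PER_BANK = ROWS_PER_BANK * BINS_PER_ROW


def locate_bin(bin_index):
    """Split a bin index into (bank, row, column) for the row-major mapping."""
    if not 0 <= bin_index < BANKS * BINS_PER_BANK:
        raise ValueError("bin index out of range")
    bank, within_bank = divmod(bin_index, BINS_PER_BANK)
    row, column = divmod(within_bank, BINS_PER_ROW)
    return bank, row, column
```

In this sketch, bins 4 apart land in consecutive rows of the same bank, so the neighbors on either side of a bin are almost always in the same bank and can be reached without the global bus.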

[0056] As an alternative mapping technique, contiguous groups of 4 bins may be located in neighboring banks. This would allow the “least” and/or “most” entry in a bin to be reallocated to the neighboring bins without an additional memory cycle, since the neighbor would be in a separate bank and could be opened simultaneously. Neighboring banks may also share a local bus connection to facilitate this transfer, which would then not require the use of global bus resources, keeping energy dissipation down.

[0057] Another alternative mapping technique is to map the bins in a coarse “checker board” within each bank with gaps in each bank where the next bins are in a neighboring bank. This technique is shown in FIG. 9. With that checkerboard organization, the CAM will be able to quickly “open up” large bin areas for totally new data patterns that don't fit within the existing range of data branch values that are sorted. In other words, one “square” of the checker board could be used as a smaller version of the CAM so that any systematic time variation in the data patterns won't “saturate” the comparison values for the entire CAM into a small region of the possible 64 bit values. Since the gaps are coarse, the energy/time to “connect” neighboring bins in different banks at the “borders of the checker board” is not dramatically increased. The most flexible search architecture in accordance with the invention may have several tree to bin “mapping” algorithms resident for the user to choose based on expected usage, with the best “all around” algorithm as a default.

[0058] In accordance with the invention, it should be noted that bulk “moves” of bins containing N data items imply touching up to N APL links in the AMR since the data will have moved even though it is in the same “location” (IML) to the outside user. When a CAM area is being used simply to do sorting or merging of 2 files, the APL can be turned off for faster reorganization of bins. That is, for some uses the CAM areas can be content addressable only after they have been written. Now, more details of the local compare and control logic will be described.

[0059] FIG. 10 is a diagram illustrating more details of the compare portion of the logic 104 of FIG. 9, and illustrating the advantages this invention attains in flexible usage of the memory array compared to a traditional CAM. A single bit of each of 2 entries in a bin is shown. Unlike the tree comparisons, the leaf comparison is a simple equality to find a match, since no further reduction of the search space is required. Also shown is circuitry to turn Entry X+1 into a mask for Data Entry X. If the Binary/Ternary signal is set to “binary”, the AND gate is turned off, and both Entry X and Entry X+1 participate in comparison against the key. If the Binary/Ternary signal is set to “ternary”, the AND gate is turned on, and a 1 bit in Data Entry X+1 forces a true comparison. This means that Data Entry X+1 is acting as a comparison mask for Data Entry X. During this operation, the comparison for Entry X+1 is suppressed. Note that the IML and APL for the X+1 entry become free to be part of the returned association as well.
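
The behavior of the Binary/Ternary signal can be modeled in software with bitwise operations. The following is a behavioral sketch only, not a description of the gate-level circuit: the function name compare_pair, the 64 bit width and the (match_x, match_x1) return convention are assumptions made for illustration.

```python
WIDTH = 64                 # assumed entry width in bits
FULL = (1 << WIDTH) - 1    # all-ones mask used to stay within WIDTH bits


def compare_pair(key, entry_x, entry_x1, ternary):
    """Return (match_x, match_x1) for one pair of adjacent entries.

    Binary mode:  both entries are compared against the key for equality.
    Ternary mode: a 1 bit in entry_x1 forces that bit position of entry_x
                  to compare true, and the entry_x1 comparison is suppressed.
    """
    diff_x = (key ^ entry_x) & FULL          # 1 wherever key and X differ
    if ternary:
        unmasked_diff = diff_x & ~entry_x1 & FULL
        return unmasked_diff == 0, False
    diff_x1 = (key ^ entry_x1) & FULL
    return diff_x == 0, diff_x1 == 0
```

For example, with Entry X holding 0xAB00 and Entry X+1 holding 0x00FF, a key of 0xAB12 matches Entry X in ternary mode because the low 8 bits are masked, but matches neither entry in binary mode.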

[0060] This comparison circuitry represents a small percentage of the total transistors in the invention, since the comparison logic per bit is shared by all memory bits in a bitline of the memory, unlike a traditional CAM, in which this circuitry would be replicated at every bit. In comparison, it can readily be appreciated by one skilled in the art that adding the binary/ternary control and gating at every bit of a memory to achieve the ability to program a block to perform either binary or ternary compare operations would be an uneconomically large burden for the traditional CAM. In a traditional CAM capable of ternary comparisons, binary comparison is achieved by programming the mask off, wasting that storage for other uses. The invention allows that storage to be economically recovered for use as data in a binary compare operation. So, as a unique advantage of this architecture, it is possible to program all or portions of the CAM to perform binary or ternary comparisons, with the binary areas being twice as large in number of data bits as the ternary memory areas. Now, the insertion method in accordance with the invention for inserting entries into the CAM will be described.

[0061] There are many candidate insertion algorithms ranging from greedy algorithms that use new bins whenever neighboring bins are ½ full to algorithms that reserve 1 of every 4 bins to accommodate data “clustering” that changes in time and may burst many closely related new values that aren't “close” to previous data. As an example, IP addresses for the Internet may exhibit this behavior. However, any insertion algorithm needs to be robust for the corner case when the CAM configured area is operated with many bins full and there are only 1 or 2 openings left in remaining bins. While many uses for CAM require only “slow” insertion rates, excessive “re-arranging” of data can still lead to unacceptable collisions with accesses during the matching operations. In addition, the energy inefficiency of a write (insertion) operation rises dramatically.

[0062] Using the organization shown above in the figures, insertion rates should stay very high until each insertion is trying to overfill a bin with neighbors several n*4 away already full (e.g., for the case where the memory is more than 63/64 full if the smoothing algorithm has successfully kept the distribution even). In the case that the CAM is so close to full that smoothing would take excessive time, the CAM may preferably use the storage in the AMR to store one or more overflow entries for that bin. In the example shown in FIGS. 1 and 2, if 2 bits per AMR entry are reserved for this, then there is enough storage for an overflow entry per bin (2*64).

[0063] All of the preceding discussion about mapping is aimed at optimally avoiding the case where the entry forces all bins between the target bin and a bin with an available location to re-organize their least and most entry pointers. If that occurs at the same time that a new entry write attempt to the same bin occurs, that will cause a stall of the insertion process. This stall situation cannot be avoided by any insertion algorithm if the CAM area is run continually in an “almost full” state. However, the “fullness” that causes the increase in the insertion times occurs at a much higher level than for software hashing algorithms in the literature due to the ability to use the local compare and match logic for each bank to “invisibly” move entries transparently to the match and insertion process. It is important to note that this problem may be avoided since the user of the CAM may simply not operate the CAM area in an “almost full” state, or if that is unacceptable, sufficient “invisible memory” can be added to allow “buffer bins” to be mapped in to handle overflow situations. This mapping is the intended use of the indirection in the tree. Instead of moving entries out of a bin in bulk, with the required updates of APL pointers, a new bin can be selected by the 20 bit 2nd level SRAM pointer, and a bin of memory that has been held in reserve can be mapped “in between” existing bins in an area of the tree that is congested. This is a unique advantage of the invention, since table management in traditional CAMs, especially ternary CAMs that use the CAM address as a match priority, can be very onerous. The invention allows easy and flexible allocation of leaf bins according to need.

[0064] The invention allows for an extension to find the intersection of sets. If a datum belongs to several larger databases, the same data (masked or not) may be stored in several different addresses (i.e., with several different associations); however, all the entries will be grouped together in the same or contiguous bins. Then, when a CAM operation (data search) takes place, all of the different databases that contain that datum can be identified quickly, as they will all return a match. This is a very important feature which can implement set intersection calculations for databases that are resident in the CAM. For example, say that the database for “U.S. cities” and the database for “animals” are both resident in the CAM. When the data record “buffalo” is presented to the CAM, it will return the address (association) for both the U.S. city and the animal. A traditional CAM can also identify all the data items that match a key, and return them all, but a traditional CAM is too small to hold entire databases, and the time to page data in and out of the CAM overwhelms the advantage of the fast comparison once data is in the CAM. The invention is LARGE enough so that multiple instances can hold a database and avoid the time to swap the databases in and out. Further, traditional CAMs do not allow “almost the same” set comparisons which can be performed by the invention (due to grouping of like entries); thus the invention allows “fuzzy” set intersection on a very large memory.
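
The “buffalo” example can be mimicked in software by keeping (datum, association) pairs in value order, so that equal data sit next to each other in the way like entries are grouped in the same or contiguous bins. The sketch below is a software analogy only: the class name GroupedStore, its methods and the use of a single sorted list in place of bins are all assumptions made for illustration.

```python
from bisect import bisect_left, insort


class GroupedStore:
    """Keeps (datum, association) pairs sorted so equal data are contiguous."""

    def __init__(self):
        self._entries = []  # sorted list of (datum, association) tuples

    def insert(self, datum, association):
        insort(self._entries, (datum, association))

    def matches(self, datum):
        """Return every association stored with this datum."""
        # (datum,) sorts just before any (datum, association) pair.
        i = bisect_left(self._entries, (datum,))
        found = []
        while i < len(self._entries) and self._entries[i][0] == datum:
            found.append(self._entries[i][1])
            i += 1
        return found


store = GroupedStore()
store.insert("buffalo", "US cities")
store.insert("denver", "US cities")
store.insert("buffalo", "animals")
store.insert("zebra", "animals")
print(store.matches("buffalo"))
```

A single lookup for “buffalo” returns both associations, which identifies the datum as a member of the intersection of the two databases.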

[0065] In addition, the CAM architecture may also perform functions like merging and sorting lists just by writing data into the memory. This is again very powerful since it occurs in a large enough memory to be meaningful for database tasks. Sorting is a natural feature of the invention, whereas traditional CAMs cannot sort data, since they only perform equality comparisons and return yes/no match answers. After a database has been entered into the invention, the memory can be scanned out in bin order, and the database will automatically be sorted. This constitutes another unique advantage of the invention.

[0066] A smoothing method may be used to move around the entries in the CAM so that each bin is approximately equally filled, which reduces collisions. There are many possible smoothing methods that may be used. A preferred method involves having the local compare and match circuit for each bank move items out of “too full” bins (as defined by a user defined variable) and into neighboring bins until the bins are filled to within some delta variable (again as defined by the user). This smoothing may be done as a background task as banks are sitting idle. By allowing the local compare and match to move more than one item at a time from bin to bin, the smoothing method is able to keep ahead of the insertion algorithm for all except the most extreme cases of an almost full memory (as discussed above). Any smoothing method implies moving data that is already stored and, therefore, extra energy dissipation. Disabling the movement of data until required to “split a leaf bin” (or move the extra items from full bins that are stored to) will minimize the energy consumption of the memory. Now, a method for identifying the entries in a bin with the least value and the most value will be described.
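
A software sketch of one possible smoothing pass is given below. It is an illustration under assumptions rather than the preferred method itself: the function name smooth, the parameters too_full and delta (standing in for the two user defined variables), the one-item-at-a-time moves and the choice of the emptier neighbor are all hypothetical. Moving the least entry to the left neighbor or the most entry to the right neighbor keeps the bins in value order.

```python
def smooth(bins, too_full, delta):
    """Move entries out of over-full bins into emptier neighbors.

    bins:     list of lists; every value in bins[i] is <= every value in bins[i+1].
    too_full: a bin holding more than this many entries is a smoothing candidate.
    delta:    stop once a bin is within delta entries of its emptier neighbor.
    Returns the number of entries moved.
    """
    if delta < 1:
        raise ValueError("delta must be at least 1 so that smoothing terminates")
    moves = 0
    changed = True
    while changed:
        changed = False
        for i in range(len(bins)):
            if len(bins[i]) <= too_full:
                continue
            neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(bins)]
            if not neighbours:
                continue
            j = min(neighbours, key=lambda n: len(bins[n]))
            if len(bins[i]) - len(bins[j]) <= delta:
                continue
            # Least entry goes left, most entry goes right, preserving bin order.
            value = min(bins[i]) if j < i else max(bins[i])
            bins[i].remove(value)
            bins[j].append(value)
            moves += 1
            changed = True
    return moves
```

In hardware the same effect would be obtained by the local compare and match logic during idle cycles; the sketch only shows the bookkeeping that such a pass has to respect.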

[0067] In order to achieve the above system, it is desirable to provide an efficient method for determining the entries with the least value and the most value in each bin so that these values may be used by the search tree during the tree search. In a preferred embodiment, either a 6-bit field can be reserved in the AMR entry to identify the ordering of elements (and hence the least value and the most value) or the information may be regenerated with each access to a bin. The determining method is implementation dependent based on possible sharing of the local compare and match circuitry amongst banks.

[0068] In accordance with the invention, it is also possible to combine one or more CAMs in accordance with the invention together to produce a larger capacity CAM. In particular, there are many well known ways to assert a signal common to several parts (e.g., open collector, or simply ORing and then polling for which is asserted). In accordance with the invention, the CAMs may either be treated as a common pool of memory with each part creating its own local tree, or a special ASIC could treat each part as a branch in a still larger tree that it creates.

[0069] FIG. 11 shows an example of an insertion and an example of a search. The insertion selects the largest branch which the new entry is larger than or equal to, and the new entry gets written to any empty entry location. In FIG. 11, 0386 is larger than 0314 and less than 0476, so root branch 2 is taken for insertion. It is larger than 037E and less than 0410, so Bin 61 is chosen for insertion. Entry 61 is empty, so the insertion can occur there. If it is the new least entry, it gets tagged as such and the (optional) 6 bit ordering pointer gets set to point to the previous least entry. Similarly, a search for 038F follows the branches down for the same comparison reasons, and entry 3 is a match. Notice once again that the entries in the bin are not sorted into physically ordered locations.
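
The branch selection used in this example can be traced in a short sketch. Only the values 0314, 0476, 037E, 0410, 0386 and 038F, root branch 2, Bin 61 and entry 3 come from the example above; every other table value, the neighboring bin numbers 60 and 62, the function names and the choice of the first free slot (rather than entry 61 as in the figure) are hypothetical and chosen only to make the sketch self-contained.

```python
from bisect import bisect_right

ENTRIES_PER_BIN = 64

# Least value of each root branch; index 2 holds 0x0314 as in the example.
ROOT_LEAST = [0x0000, 0x0100, 0x0314, 0x0476]
# For each root branch: least values of its bins and the bin numbers they select.
LEVEL2 = {2: ([0x0314, 0x037E, 0x0410], [60, 61, 62])}
# Leaf bins; entries are stored unsorted, None marks an empty location.
bins = {61: [0x037E, None, None, 0x038F] + [None] * (ENTRIES_PER_BIN - 4)}


def select_branch(least_values, key):
    """Index of the largest branch value that is <= key."""
    index = bisect_right(least_values, key) - 1
    if index < 0:
        raise ValueError("key is below the smallest branch value")
    return index


def find_bin(key):
    branch = select_branch(ROOT_LEAST, key)
    leasts, bin_ids = LEVEL2[branch]
    return branch, bin_ids[select_branch(leasts, key)]


def insert(key):
    """Write key into an empty location of its bin; returns (branch, bin, slot)."""
    branch, bin_id = find_bin(key)
    slots = bins.setdefault(bin_id, [None] * ENTRIES_PER_BIN)
    slot = slots.index(None)  # raises ValueError if the bin is full
    slots[slot] = key
    return branch, bin_id, slot


def search(key):
    """Return (bin, slot) of an exact match, or None."""
    _, bin_id = find_bin(key)
    slots = bins.get(bin_id, [])
    return (bin_id, slots.index(key)) if key in slots else None
```

Calling insert(0x0386) follows root branch 2 into Bin 61, and search(0x038F) follows the same path and finds the match in entry 3, without the bin contents ever being physically sorted.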

[0070] While the foregoing has been with reference to a particular embodiment of the invention, it will be appreciated by those skilled in the art that changes in this embodiment may be made without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims.

1. A memory device, comprising: a main data memory for storing a plurality of entries in the memory device; an address map and overflow data memory for storing an address map of the entries in the main data memory, the address map comprising an intended address location (IAL) and an actual physical location (APL) wherein the IAL indicates the external memory address of each entry and the APL indicates that actual memory locations for each entry within the memory device; a controller for controlling the operation of the main data memory and the address map and overflow data memory using the IAL and APL in order to operate the memory as one or more of a CAM and a RAM; a comparator that compares each bit of an incoming piece of data with each bit of each entry in the memory device; and the controller further comprising search tree logic unit that sorts through the entries in the memory device to reduce the number of bit-by-bit comparisons performed by the comparator.
 2. The device of claim 1, wherein the search tree logic unit further comprises a first compare and branch logic unit that compares the incoming piece of data to one or more memory bins to determine the bin in which the key is located, each bin comprising a plurality of memory locations wherein the bin has a least value and a most value indicating the range of entry values in the memory locations encompassed by the bin so that the compare and branch logic unit compares the incoming piece of data to the least and most values for each bin simultaneously to generate a selected bin.
 3. The device of claim 2, wherein the search tree logic unit further comprises a second compare and branch logic unit that compares the incoming piece of data to the entries in one or more sub-bins in the bin selected by the first branch and compare logic unit, each sub-bin comprising a plurality of memory locations wherein the sub-bin has a least value and a most value indicating the range of entry values in the memory locations encompassed by the sub-bin so that the second compare and branch logic unit compares the incoming piece of data to the least and most values for each sub-bin contained in the selected bin simultaneously to generate a selected sub-bin.
 4. The device of claim 3, wherein the comparator compares each bit in the incoming piece of data with each bit in the entries contained in the selected sub-bin in order to determine if a match has occurred between the entries in the memory device and the incoming piece of data.
 5. A memory device, comprising: a main data memory for storing a plurality of entries in the memory device; an address map and overflow data memory for storing an address map of the entries in the main data memory, the address map comprising an intended address location (IAL) and an actual physical location (APL) wherein the IAL indicates the external memory address of each entry and the APL indicates that actual memory locations for each entry within the memory device; a controller for controlling the operation of the main data memory and the address map and overflow data memory using the IAL and APL in order to store and retrieve data from the memory; a comparator that compares each bit of an incoming piece of data with each bit of each entry in the memory device; and the controller further comprising search tree logic unit that sorts through the entries in the memory device to reduce the number of bit-by-bit comparisons performed by the comparator.
 6. The device of claim 5, wherein the search tree logic unit further comprises a first compare and branch logic unit that compares the incoming piece of data to one or more memory bins to determine the bin in which the key is located, each bin comprising a plurality of memory locations wherein the bin has a least value and a most value indicating the range of entry values in the memory locations encompassed by the bin so that the compare and branch logic unit compares the incoming piece of data to the least and most values for each bin simultaneously to generate a selected bin.
 7. The device of claim 6, wherein the search tree logic unit further comprises a second compare and branch logic unit that compares the incoming piece of data to the entries in one or more sub-bins in the bin selected by the first branch and compare logic unit, each sub-bin comprising a plurality of memory locations wherein the sub-bin has a least value and a most value indicating the range of entry values in the memory locations encompassed by the sub-bin so that the second compare and branch logic unit compares the incoming piece of data to the least and most values for each sub-bin contained in the selected bin simultaneously to generate a selected sub-bin.
 8. The device of claim 7, wherein the comparator compares each bit in the incoming piece of data with each bit in the entries contained in the selected sub-bin in order to determine if a match has occurred between the entries in the memory device and the incoming piece of data.
 9. A memory device, comprising: a main data memory for storing a plurality of entries in the memory device; an address map and overflow data memory for storing an address map of the entries in the main data memory, the address map comprising an intended address location (IAL) and an actual physical location (APL) wherein the IAL indicates the external memory address of each entry and the APL indicates that actual memory locations for each entry within the memory device; and a controller for controlling the operation of the main data memory and the address map and overflow data memory using the IAL and APL in order to store and retrieve data from the memory, the controller further comprising an organizer that organizes the memory into a plurality of bins wherein each bin comprises a plurality of sub-bins and each sub-bin comprises a plurality of entries in the memory device, the bins and sub-bins having a least value and a most value associated with it that indicate a minimum value and a maximum value contained in the bin or sub-bin; the controller further comprising a search tree logic unit that compares an incoming piece of data to the plurality of bins based on the least and most values to identify a bin in which the incoming piece of data is located and that compares the incoming piece of data to the sub-bins within the identified bin to determine the sub-bin that contains an entry matching the incoming piece of data.